Mock Exam¶

In [1]:

%matplotlib inline

In [2]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.linalg as la
import scipy.optimize as opt

In [3]:

%load_ext rpy2.ipython

1. (10 points)

Euclid’s algorithm for finding the greatest common divisor of two numbers is

gcd(a, 0) = a
gcd(a, b) = gcd(b, a modulo b)

Write a function to find the greatest common divisor in Python
What is the greatest common divisor of 17384 and 1928?
Write a function to calculate the least common multiple
What is the least common multiple of 17384 and 1928?

2. (10 points)

Using the iris dataset from http://goo.gl/3b3439, answer the following questions:

Find the mean, min and max values of all four measurements (sepal.length, sepal.width, petal.length, petal.width) for each species
Find the average petal.width for rows where the petal.length is less than the sepal.width

3. (10 points)

Find the coordinates of the vector \(\pmatrix{1\\ 2 \\3}\) with respect to the eigenvectors of the following matrix.

array([[ 0.18673654,  0.20037016,  0.47406091],
       [ 0.21715108,  0.44708353,  0.79204575],
       [ 0.24299882,  0.51936745,  0.3061621 ]])

4.(20 points)

Consider the following system of equations:

\[\begin{split}\begin{align*} 2x_1& - x_2& +x_x &=& 6\\ -x_1& +2x_2& - x_3 &=& 2\\ x_1 & -x_2& + x_3 &=& 1 \end{align*}\end{split}\]

Consider the system in matrix form \(Ax=b\) and define \(A\), \(b\) in numpy.
Show that \(A\) is positive-definite
Use the appropriate matrix decomposition function in numpy and back-substitution to solve the system. Remember to use the structure of the problem to determine the appropriate decomposition.

5. (10 points)

The heart dataframe at https://goo.gl/CbJwQM contains information about the survival of patients on the waiting list for the Stanford heart transplant program.

start, stop, event:  Entry and exit time and status for this interval of time
age:                 age-48 years
year:                year of acceptance (in years after 1 Nov 1967)
surgery:             prior bypass surgery 1=yes
transplant:          received transplant 1=yes
id:                  patient id

Answer the following questions with respect to the heart data set:

Sort the data frame by age in descending order (oldest at top) without making a copy
How many patients received a transplant?
What is the average age for transplanted patients under the age of 70?
Find the mean and standard deviation of age for each value of the transplant variable.

6. (10 points)

You are given the following DNA sequecne in FASTA format.

dna = '''> A simulated DNA sequence.
TTAGGCAGTAACCCCGCGATAGGTAGAGCACGCAATCGTCAAGGCGTGCGGTAGGGCTTCCGTGTCTTACCCAAAGAAAC
GACGTAACGTTCCCCGGGCGGTTAAACCAAATCCACTTCACCAACGGCATAACGCGAAGCCCAAACTAAATCGCGCTCGA
GCGGACGCACATTCGCTAGGCTGTGTAGGGGCAGTCTCCGTTAAGGACGATTACCACGTGATGGTAGTTCGCAACATTGG
ACTGTCGGGAATTCCCGAAGGCACTTAAGCGGAGTCTTAGCGTACAGTAACGCAGTCCCGCGTGAACGACTGACAGATGA
'''

Remove the comment line and combine the 4 lines of nucleotide symbols into a single string
Count the frequency of all 16 two-letter combinations in the string.

7. (10 points)

Write a flatmap function that works like map except that the function given takes a list and returns a list of lists that is then flattened (4 points).

In other words, flatmap takes two arguments, a function and a list (or other iterable), just like map. However the function given as the first argument takes a single argument and returns a list (or other iterable). In order to get a simple list back, we need to unravel the resulting list of lists, hence the flatten part.

For example,

flatmap(lambda x: x.split(), ["hello world", "the quick dog"])

should return

["hello", "world", "the", "quick", "dog"]

8. (30 points)

You are given the following set of data to fit a quadratic polynomial to

x = np.arange(10)
y = np.array([  1.58873597,   7.55101533,  10.71372171,   7.90123225,
                -2.05877605, -12.40257359, -28.64568712, -46.39822281,
                -68.15488905, -97.16032044])

Find the least squares solution by using the normal equations \(A^T A \hat{x} = A^T y\). (5 points)
Write your own gradient descent optimization function to find the least squares solution for the coefficients \(\beta\) of a quadratic polynomial. Do not use a gradient descent algorithm from a package such as scipy-optimize or scikit-learn. You can use a simple for loop - start with the parameters beta = np.zeros(3) with a learning rate \(\alpha = 0.0001\) and run for 100000 iterations. (15 points)
Plot the data together with the fitted polynomial. (10 points)

9. (10 points)

Using scipy.optimize, find the values of \(x\) and \(y\) that minimize \(e^{x^2 + y^2}\) in the unconstrained case and in the presence of the constraint that \(x + y = 3\). Use (1,1) as a starting guess.

10. (10 points)

Implement a Python function to find roots using the bisection method. Use it to find solutions to \(x^3 + 4x^2 -3 = x\) between -4 and 1. Do not use the standard library bisect method - the idea is to develop the algorithm using only basic Python language constructs.

11. (10 points)

Let \(f(x)\) be a linear transformation of \(\mathbb{R}^3\) such that

\[\begin{split}\begin{eqnarray*} f(e_1) &=& (1,1,3)\\ f(e_2) &=& (1,0,4)\\ f(e_3) &=& (0,2,1) \end{eqnarray*}\end{split}\]

Find a matrix representation for \(f\).
Compute the matrix representation for \(f\) in the basis

\[\begin{split}\begin{eqnarray*} v_1 &=& (2,3,3)\\ v_2 &=& (8,5,2)\\ v_3 &=& (1,0,5) \end{eqnarray*}\end{split}\]

12. (10 points)

A milkmaid is at point A and needs to get to point B. However, she also needs to fill a pail of water from the river en route from A to B. The equation of the river’s path is shown in the figure below. What is the minimum distance she has to travel to do this?

Solve using scipy.optimize and constrained minimization.
Solve without using scipy.optimize. Hint: Use Lagrange

Milkmaid problem

13. (10 points)

Given the set of vectors

v1 = np.array([1,2,3])
v2 = np.array([2,4,7])
v3 = np.array([1,0,1])

Calculate the pairwise Euclidean distance matrix
Find an orthogonal basis for the space spanned by these vectors without using any functions from numpy.linag or scipy.linalg

14. (10 points)

Find the minimum of the following quadratic function on \(\mathbb{R}^4\)

\[ \begin{align}\begin{aligned}f(x) = x^TAx +b^Tx +c\\where\end{aligned}\end{align} \]

\[\begin{split}A = \left(\begin{matrix}13&5&0&0\\5&7&0&0\\0&0&20&-7\\0&0&-7&12\end{matrix}\right), b = \left(\begin{matrix}1\\1\\1\\1\end{matrix}\right) \textrm {and } c = 2\end{split}\]

and \(x\) is a column vector.

Using scipy.optimize (4 points)
Using a matrix decomposition method (library functions - no need to code your own). Note: for full credit you should exploit matrix structure. (4 points)
Find the minimum under the constraint \(||x||^2 = 1\) (i.e. on the unit sphere in \(\mathbb{R}^4\)). (2 points)

**Note: Do not be overly concerned if your values for \(x\) at the minimum do not match exactly **

15. (10 points)

Given the following covariance matrix

A = np.array([[2,1],[1,4]])

Show that the eigenvectors of \(A\) are orthogonal.
What is the vector representing the first principal component direction?
Find \(A^{-1}\) without performing a matrix inversion.
What are the coordinates of the data points (0, 1) and (1, 1) in the standard basis expressed as coordinates of the principal components?
What is the proportion of variance explained if we keep only the projection onto the first principal component?

16. (10 points)

Find the minimum of the following quadratic function on \(\mathbb{R}^2\)

\[ \begin{align}\begin{aligned}f(x) = x^TAx +b^Tx +c\\where\end{aligned}\end{align} \]

\[\begin{split}A = \left(\begin{matrix}13&5\\5&7\end{matrix}\right), b = \left(\begin{matrix}1\\1\end{matrix}\right) \textrm {and } c = 2\end{split}\]

Under the constraints:

\[g(x) = 2x_1-5x_2=2 \;\;\;\;\;\; \textrm{ and } \;\;\;\;\;\; h(x) = x_1+x_2=1\]

Use a matrix decomposition method to find the minimum of the unconstrained problem without using scipy.optimize (Use library functions - no need to code your own). Note: for full credit you should exploit matrix structure. (3 points)
Find the solution using constrained optimization with the scipy.optimize package. (3 points)
Use Lagrange multipliers and solving the resulting set of equations directly without using scipy.optimize. (4 points)